115 research outputs found
A Lower Bound for the Optimization of Finite Sums
This paper presents a lower bound for optimizing a finite sum of
functions, where each function is -smooth and the sum is -strongly
convex. We show that no algorithm can reach an error in minimizing
all functions from this class in fewer than iterations, where is a
surrogate condition number. We then compare this lower bound to upper bounds
for recently developed methods specializing to this setting. When the functions
involved in this sum are not arbitrary, but based on i.i.d. random data, then
we further contrast these complexity results with those for optimal first-order
methods to directly optimize the sum. The conclusion we draw is that a lot of
caution is necessary for an accurate comparison, and identify machine learning
scenarios where the new methods help computationally.Comment: Added an erratum, we are currently working on extending the result to
randomized algorithm
Distributed Delayed Stochastic Optimization
We analyze the convergence of gradient-based optimization algorithms that
base their updates on delayed stochastic gradient information. The main
application of our results is to the development of gradient-based distributed
optimization algorithms where a master node performs parameter updates while
worker nodes compute stochastic gradients based on local information in
parallel, which may give rise to delays due to asynchrony. We take motivation
from statistical problems where the size of the data is so large that it cannot
fit on one computer; with the advent of huge datasets in biology, astronomy,
and the internet, such problems are now common. Our main contribution is to
show that for smooth stochastic problems, the delays are asymptotically
negligible and we can achieve order-optimal convergence results. In application
to distributed optimization, we develop procedures that overcome communication
bottlenecks and synchronization requirements. We show -node architectures
whose optimization error in stochastic problems---in spite of asynchronous
delays---scales asymptotically as \order(1 / \sqrt{nT}) after iterations.
This rate is known to be optimal for a distributed system with nodes even
in the absence of delays. We additionally complement our theoretical results
with numerical experiments on a statistical machine learning task.Comment: 27 pages, 4 figure
Dual Averaging for Distributed Optimization: Convergence Analysis and Network Scaling
The goal of decentralized optimization over a network is to optimize a global
objective formed by a sum of local (possibly nonsmooth) convex functions using
only local computation and communication. It arises in various application
domains, including distributed tracking and localization, multi-agent
co-ordination, estimation in sensor networks, and large-scale optimization in
machine learning. We develop and analyze distributed algorithms based on dual
averaging of subgradients, and we provide sharp bounds on their convergence
rates as a function of the network size and topology. Our method of analysis
allows for a clear separation between the convergence of the optimization
algorithm itself and the effects of communication constraints arising from the
network structure. In particular, we show that the number of iterations
required by our algorithm scales inversely in the spectral gap of the network.
The sharpness of this prediction is confirmed both by theoretical lower bounds
and simulations for various networks. Our approach includes both the cases of
deterministic optimization and communication, as well as problems with
stochastic optimization and/or communication.Comment: 40 pages, 4 figure
A Contextual Bandit Bake-off
Contextual bandit algorithms are essential for solving many real-world
interactive machine learning problems. Despite multiple recent successes on
statistically and computationally efficient methods, the practical behavior of
these algorithms is still poorly understood. We leverage the availability of
large numbers of supervised learning datasets to empirically evaluate
contextual bandit algorithms, focusing on practical methods that learn by
relying on optimization oracles from supervised learning. We find that a recent
method (Foster et al., 2018) using optimism under uncertainty works the best
overall. A surprisingly close second is a simple greedy baseline that only
explores implicitly through the diversity of contexts, followed by a variant of
Online Cover (Agarwal et al., 2014) which tends to be more conservative but
robust to problem specification by design. Along the way, we also evaluate
various components of contextual bandit algorithm design such as loss
estimators. Overall, this is a thorough study and review of contextual bandit
methodology
Optimal Allocation Strategies for the Dark Pool Problem
We study the problem of allocating stocks to dark pools. We propose and
analyze an optimal approach for allocations, if continuous-valued allocations
are allowed. We also propose a modification for the case when only
integer-valued allocations are possible. We extend the previous work on this
problem to adversarial scenarios, while also improving on their results in the
iid setup. The resulting algorithms are efficient, and perform well in
simulations under stochastic and adversarial inputs
- …